Abstract

One of the major obstacles to the widespread development and utilization
of distributed data base management systems is the lack of an efficient
recovery technique. A methodology is presented here for recovery of distributed
data bases. The central operation of the recovery technique is rollback of
a data base application task on the processor which controls access to the
data. The rollback procedure restores the data base to its original state
prior to the execution of the application task and determines the set of
application tasks which may have been affected by that task. Tasks that have
not operated upon data altered by the task being rolled back are not affected by
the procedure. The rollback procedure attempts to minimize the time and
space requirements for recovery.

1. Introduction


Distributed data base management systems (DDBMS) have the potential to
make a significant impact on the way data is viewed and processed. The ability
to easily and safely access data controlled by several different computers has
been a long standing goal of workers in the data base area. Progress toward
this goal has accelerated rapidly in both academic and industrial environments.
However, many problems still require efficient solutions. One of these problem
areas that can have a significant performance impact is rollback and recovery,
which is the topic of this paper.

1.1  Distributed  Data  Base  Management  Systems 

Many forms of distributed data base management systems have been
studied [1-7]. All distributed data base configurations can be described as
a  collection  of  host,  back-end,  or  bi-functional  machines.  A  host  machine  is 
a  processor  which  executes  data  base  application  tasks.  The  data  base  operations 
requested  by  a  host  machine  are  then  performed  by  a  back-end  machine.  A 
bi-functional  machine  contains  software  to  carry  out  both  the  host  and  back-end 
functions.  Figure  1  illustrates  a  DDBMS  with  host,  back-end,  and  bi-functional 
processors. 

1.2  Recovery  in  a  DBMS 

The recovery mechanism is essentially the same in all single machine
data base management systems. During the execution of the data base application
tasks, all data base requests are logged on a journal file (usually a tape).
Whenever a data base request results in a modification to the data base, the
affected pages are written onto the journal file as before and/or after
images. If a point of inactivity is reached in the processing of the data base,
a checkpoint dump of the DBMS's primary memory is taken. All journal and checkpoint
information must be time stamped to ensure proper sequencing of the recovery
operation; in the single machine case this sequencing is implied by the relative
position of the data.

If an application task should terminate abnormally, its effect on the
data base must be removed. It is possible in an interactive environment that
data written by the terminating application task may have been used by other
tasks which in turn performed additional modifications to the data base.
This phenomenon of an application task using incorrect data to produce additional
incorrect data can have a cascading effect throughout the data base, with a
large number of application tasks potentially operating with invalid information.
Currently, most DBMS users avoid this problem by doing updates in a batch mode
at a specified time and allowing only interactive query.

In  order  to  undo  the  effect  of  an  abnormally  terminating  task  on  the 
data  base,  a  rollback  procedure  is  necessary.  The  rollback  procedure  restores 
the data base to its state prior to the initiation of the erroneous application
task.  The  precise  form  of  a  rollback  procedure  varies  with  the  organization 
and  environment  of  the  DBMS. 

In the simplest case, if the run time unit of the DBMS is single threaded
and operates in a batch environment, then the data base can be restored from
the checkpoint taken prior to the initiation of the task.

In the more complex case, if the run time unit of the DBMS is multi-threaded
and operates both on-line update and batch tasks simultaneously, then
one of two basic approaches is possible. The simpler approach is to roll back
all tasks which either have executed or are executing to the time of initiation
of the faulty task. If this procedure is followed, the integrity of all portions
of the data base will be preserved, but at a potentially high cost, since it is
possible that tasks which have no interaction with the terminating task may be
rolled back needlessly. The more difficult approach suggests that only processes
which themselves have used contaminated data be rolled back. In most cases
this would involve both fewer tasks and less data. In either case, the rollback
procedure has the following basic steps:

1. The data base is restored to the first checkpoint after the initiation
of the faulty task (if a checkpoint has been taken in that time frame).

2. The page images prior to each operation from the checkpoint to task
initiation are written back to the data base. It should be noted that
the page images are applied in reverse order (hence, the term rollback).
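
The second step can be illustrated with a minimal Python sketch. It is not the mechanism of any particular DBMS: the `JournalEntry` fields, the dictionary of pages, and the integer time stamps are assumptions made for the illustration, and the checkpoint restore of step 1 is omitted.

```python
from dataclasses import dataclass


@dataclass
class JournalEntry:
    time: int       # time stamp of the logged operation (assumed integer clock)
    page: int       # number of the page that was modified
    before: bytes   # image of the page prior to the operation


def roll_back(pages, journal, task_start):
    """Undo every journaled change made at or after task_start."""
    to_undo = [e for e in journal if e.time >= task_start]
    # Apply the newest entry first, so the oldest before-image of a page
    # is written last and is the one that remains.
    for entry in sorted(to_undo, key=lambda e: e.time, reverse=True):
        pages[entry.page] = entry.before
    return pages


journal = [
    JournalEntry(time=5, page=1, before=b"A"),  # page 1 changed from A to B
    JournalEntry(time=7, page=2, before=b"X"),  # page 2 changed from X to Y
    JournalEntry(time=9, page=1, before=b"B"),  # page 1 changed from B to C
]
restored = roll_back({1: b"C", 2: b"Y"}, journal, task_start=5)
print(restored)
```

Because page 1 was changed twice, applying the images in forward order would leave it at B; the reverse order returns it to A, its state before the task began.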

Either approach to the rollback of an incorrect application program
will maintain data base integrity at the cost of halting some or all data base
activity. While the integrity of the data base is preserved, there can be no
guarantee that any erroneous information obtained by a user from an application
task that terminated between the initiation of the faulty application task and
the beginning of the rollback procedure can be corrected.

2.  The  Problem  of  Recovery  and  the  DDBMS 

The distribution of a data base system over several processors increases
the complexity of the recovery problem. Just the interprocessor communications
overhead can result in more time-consuming rollback operations if several
processors are required to participate. It is also likely to be the case that
within a DDBMS there is a larger number of programs interacting than in a single
machine system. Hence, the complexity and effect of any rollback procedure may
be compounded.

2.1 Possible DDBMS Recovery Alternatives

There  are  three  basic  approaches  to  recovery  in  a  DDBMS. 

1.  Design  the  data  base  to  permit  only  controlled  or  restricted  interaction 
among  application  tasks. 

2. Extend the single machine recovery mechanism to the distributed system.
This approach would entail rolling back all data bases on all back-end
processors in the system to the point of initiation of the faulty task.

3. Use a selective recovery mechanism to roll back only those tasks
which have used data provided by the erroneous or terminated task.

2.2  Analysis  of  Alternatives 

The three approaches to recovery in a distributed DBMS that are listed
above represent widely differing philosophies. Each approach has merit for
specific types of organizations and data processing facilities. The criteria
used here to evaluate these alternatives are based upon the recovery procedure's
effects upon system throughput, as well as data availability and the degree
of data integrity maintained within the system.

The first approach, avoiding integration in the data base, has little
appeal to the designer of a DDBMS, although in practice this might be the most
commonly used technique, since many users of data base systems on single machines
feel that avoiding integration is the only reliable means of ensuring data
integrity. If this user philosophy were applied to a DDBMS, very inefficient
utilization of the distributed system would result. In order to avoid the need
for a sophisticated recovery mechanism in a DDBMS, two tasks would not be
permitted to have simultaneous update capabilities to the same data base. For an
application system under this requirement, the DDBMS recovery problem can be
simplified. An on-line inquiry, batch update system would fall in this category.

If the single machine approach is extended to a DDBMS, then the entire
data base would be unavailable during recovery. In a system with a large
number of back-end or bi-functional processors, this approach could result in
the recovery of a small portion of a data base preventing usage of a large,
correct segment of the data. The communication required by this
technique is a message transmitted to all back-end processors
indicating that recovery must begin and giving the time of initiation of the faulty
program. This system-wide recovery approach ensures that all effects of the
erroneous application program have been removed from the data base. The
main drawbacks to this method are that access to unaffected portions of the
data base is prevented and that unnecessary rollbacks may occur.

A  selective  recovery  mechanism  would  overcome  the  main  deficiency  of 
the  system-wide  recovery  strategy  by  rolling  back  only  those  application 
tasks  that  are  operating  with  tainted  data.  Overall  system  throughput  would 
increase  under  these  circumstances,  as  would  accessibility  to  the  data  base. 

In order for a selective recovery scheme to be worthwhile, data base
integrity must be maintained. Therefore, the scheme must be certain of
including all tasks that have used incorrect data in the recovery process.

In order to accomplish selective rollback, information on the interaction
between application tasks must be maintained. This interaction information
would take the form of a potential shared data list which can be computed
from the sub-schemas of the application tasks prior to execution.

The communication overhead, which potentially has the most significant
performance effect in a DDBMS, must not exceed one transmission to each back-end
processor if the method is to be effective. The computational overhead
involved in a selective recovery strategy must be maintained at a level
where it is not significant in terms of system performance. A selective
recovery mechanism which satisfies the performance requirements listed above
has the potential for better performance in a DDBMS than the recovery schemes
discussed previously.

In distributed data base systems which are highly integrated and support
multi-threaded updating, a selective recovery technique is required if both
performance and integrity are to be preserved.

3. A Selective Recovery Methodology


The following sections explain a procedure which meets the necessary
criteria for selective rollback as described above. In addition, some
comments are made concerning the role of the DBA with respect to a DDBMS.
The procedure is oriented toward a CODASYL-type DBMS, although
the same general techniques are applicable to any type of data base system.

3.1  Definition-Time  Recovery  Processing 

As previously mentioned, the proposed selective recovery methodology
for distributed data base systems requires processing at data definition
time, as well as run time. When a new sub-schema is created, a potential
shared data list is computed by intersecting it with all other sub-schemas of
that schema. The potential shared data list indicates record and set types that
are in common with other sub-schemas.
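
The intersection can be pictured with a short Python sketch. It assumes each sub-schema is reduced to a set of record and set type names; the sub-schema names and type names below are hypothetical and do not come from the paper.

```python
def potential_shared_data_list(new_name, sub_schemas):
    """Map each other sub-schema to the record/set types it shares with new_name."""
    mine = sub_schemas[new_name]
    shared = {}
    for other, types in sub_schemas.items():
        if other == new_name:
            continue
        common = mine & types          # set intersection of type names
        if common:
            shared[other] = common
    return shared


# Hypothetical sub-schemas of one schema, each given as a set of type names.
sub_schemas = {
    "PAYROLL": {"EMPLOYEE", "DEPARTMENT", "WORKS-IN"},
    "PERSONNEL": {"EMPLOYEE", "SKILL", "HAS-SKILL"},
    "INVENTORY": {"PART", "SUPPLIER"},
}
psdl = potential_shared_data_list("PAYROLL", sub_schemas)
print(psdl)
```

Here only PERSONNEL overlaps PAYROLL (on EMPLOYEE), so a task running under INVENTORY could never be contaminated by a PAYROLL task.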

Since each application task invokes exactly one sub-schema during its
execution, the potential data overlap of any two application tasks can be
determined from their potential shared data lists. In order to maintain
the data overlap information at execution time, the activation of a data
base application task must result in a message being transmitted to all
applications that have the potential to share data with that task. The
message indicates the application task and sub-schema names. The information
relating active application tasks and sub-schemas is maintained in the data
dictionary. When an application task terminates, a similar message is sent
to all tasks with intersecting potential shared data lists.

Situations may arise in integrated data bases in which application
tasks share record types, but do not operate upon the contents of all data
items in the record. A similar possibility is that the application task may
access some data items in a read-and-print mode. In either of these situations,
an incorrect value in a data item may not be critical to the function of the
program. Rolling back a task due to an incorrect value in a non-critical
data item would have an adverse effect on system performance.

The identity of non-critical data items is heavily application task
dependent and can in no way be inferred from a sub-schema description. Since
the CODASYL specifications [8] do not provide for the identification of
non-critical data items for recovery purposes, some additional mechanism for their
identification must be provided to the data base administrator. The simplest
approach would be to maintain in the data dictionary a list of non-critical
data items for each application task.

When sub-schemas are intersected to form the potential shared data
list, non-critical data items should be removed from the intersection of the
records. Only those records which intersect on critical data items should be
included in the potential shared data list. Figure 2 illustrates the potential
shared data list for a sample application task, A.
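
A minimal Python sketch of this filtering follows. It reflects one reading of the rule: a record type stays on the list for a pair of tasks only if they share at least one data item that is critical to the other task. The function name, the item names, and the per-task dictionaries are hypothetical.

```python
def shared_on_critical_items(primary_items, other_items, other_noncritical):
    """Record types shared on at least one item that is critical to the other task."""
    shared = {}
    for record in sorted(primary_items.keys() & other_items.keys()):
        common = primary_items[record] & other_items[record]
        # Drop items the data dictionary marks as non-critical for the other task.
        critical = common - other_noncritical.get(record, set())
        if critical:
            shared[record] = critical
    return shared


# Hypothetical data items used by three tasks, keyed by record type.
a_items = {"r1": {"name", "salary"}, "r2": {"quantity"}}
b_items = {"r2": {"quantity", "price"}}
d_items = {"r1": {"name"}}
d_noncritical = {"r1": {"name"}}   # D only prints the name

with_b = shared_on_critical_items(a_items, b_items, {})
with_d = shared_on_critical_items(a_items, d_items, d_noncritical)
print(with_b, with_d)
```

Under these assumptions, record type r2 remains on the list for tasks A and B, while r1 drops out for tasks A and D because D treats its only shared item as non-critical.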

3.2 Logging

The functional back-end processor controls data base access and, therefore,
is the appropriate location to maintain the logs of data base activity. Since
a back-end processor (or a bi-functional machine serving as a back-end) may
serve a large number of application tasks, the amount of recovery information
must be minimized. Many existing single machine data base systems save the
before and after images of each page that is altered as a result of a DML
statement. For example, in a CODASYL DBMS, changes to set structure can affect
many pages and consequently consume a large amount of file space. In order to
reduce the amount of log file space required, an inverse DML rollback technique
is used. The inverse technique requires little primary memory, since for
the most part existing DML functions can be used to perform the rollback
operations. Table 1 lists the CODASYL DML verbs, the information that a
back-end processor must save to roll back that verb, and the inverse actions.

[Figure 2. Sample Potential Shared Data List]
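
The inverse technique can be sketched in Python as a lookup from each logged verb to the verb that undoes it, using the pairs given in Table 1. The dictionary-based data base and the `undo` helper are assumptions made for the illustration; ERASE and ORDER are left out because undoing them needs record images and pointer lists that this toy model does not hold.

```python
# Verbs that change the data base, with the inverse action listed in Table 1.
INVERSE = {
    "CONNECT": "DISCONNECT",
    "DISCONNECT": "CONNECT",
    "MODIFY": "MODIFY",   # MODIFY again, using the saved old record image
    "STORE": "ERASE",
}


def undo(db, verb, record, old_image=None):
    """Apply the inverse of one logged STORE or MODIFY to a dict-based data base."""
    action = INVERSE[verb]
    if action == "ERASE":
        db.pop(record, None)           # remove the record that was stored
    elif action == "MODIFY":
        db[record] = old_image         # put the old contents back
    else:
        raise NotImplementedError(action)  # set operations are outside this sketch
    return action


db = {"r1": "new record", "r2": "changed"}
first = undo(db, "MODIFY", "r2", old_image="original")
second = undo(db, "STORE", "r1")
print(first, second, db)
```

Only the old image of r2 and the name of r1 had to be saved, which is the space saving over page images that the inverse technique aims for.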

Whenever a DML verb is executed by a back-end processor, a log file
entry is written. Figure 3a depicts the format of a log file entry for a
DML verb. Note that the entries have variable lengths. In addition to
recording the DML verbs that have been executed, the log file must also
contain restart information on all application tasks. Restart information
identifies a stable point at which an application program can be restarted.
By default, the system will write a restart entry whenever an application task
is initiated. It is also desirable to permit the programmer to
indicate a restart point in a task. The CODASYL specifications do not provide
a facility for this operation. However, UNIVAC's DMS-1100 has a similar feature
in the LOG verb [9]. The format of a restart entry for the log file is shown
in Figure 3b.

Figure  4  gives  some  sample  DML  commands  and  their  resultant  log  file 
entries. 
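
As an illustration of the two kinds of entry, the following Python sketch models a log entry and a lookup of restart points. The field names follow the column headings of Figure 4 (task, time, occurrence, operation); the `recovery_info` field, the integer times, and the `restart_point` helper are assumptions rather than the layout of Figure 3.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    task: str
    time: int
    operation: str                     # a DML verb, or "RESTART"
    occurrence: Optional[str] = None   # record or set occurrence; None for RESTART
    recovery_info: bytes = b""         # variable-length data needed to undo the verb


def restart_point(log, task, before):
    """Latest restart time for task that is not later than `before`, or None."""
    times = [e.time for e in log
             if e.task == task and e.operation == "RESTART" and e.time <= before]
    return max(times) if times else None


log = [
    LogEntry("B", 4, "RESTART"),                 # written at task initiation
    LogEntry("B", 6, "GET", "r2"),
    LogEntry("B", 8, "RESTART"),                 # programmer-requested restart point
    LogEntry("B", 9, "MODIFY", "r2", recovery_info=b"old image of r2"),
]
print(restart_point(log, "B", before=7), restart_point(log, "B", before=9))
```

A retrieval entry carries no recovery information, while the MODIFY entry carries the old record image, which is why entries differ in length.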

In certain cases, particularly a strict retrieval environment, the
logging of read operations may produce a large number of entries on the log
file that will never be applied in any rollback situation. Considerable rollback
overhead could be saved if the data base administrator were provided the
option of shutting off the logging facility for retrieval operations on selected
tasks. This could be accomplished at task initiation time.

3.3 Rollback

When the DBMS software on a host processor determines that an application
task has terminated abnormally, rollback procedures are initiated. The host
processor notifies all back-end processors for this task that rollback must
occur and provides the task name, its initiation time, its potential shared
data list, and a list of its areas which are open for update as parameters.
The method for identifying the back-end processors for a given task is explained
in Reference [10]. In the ensuing presentation, this task shall be known as
the primary rollback task.

DML Verb    Effect on Data Base                   Recovery Information                           Recovery Action
----------  ------------------------------------  ---------------------------------------------  --------------------------
ACCEPT      None                                  N/A                                            N/A
CONNECT     Changes Pointers                      Record & Set Names                             DISCONNECT
DISCONNECT  Changes Pointers                      Record & Set Names                             CONNECT
ERASE       Removes Record and Members            Images of All Erased Records, Set Memberships  All Records
FIND        None                                  N/A                                            N/A
FINISH      None                                  N/A                                            N/A
FREE        None                                  N/A                                            N/A
GET         None                                  N/A                                            N/A
KEEP        None                                  N/A                                            N/A
MODIFY      Changes Record Contents               Old Record Image                               MODIFY
ORDER       Changes Pointers                      Set Pointers in Old Order                      Reorder Using Old Pointers
READY       None                                  N/A                                            N/A
STORE       Changes Record Contents and Pointers  Record Name                                    ERASE
REMONITOR   None                                  N/A                                            N/A
USE         None                                  N/A                                            N/A

Table 1

Recovery Information and Action for DML Verbs

[Figure 3. Log File Entries: a. DML Verb Entry; b. Restart Entry]

                         Back-End Processor's Log File
Time  Command            Task  Time  Occurrence  Operation
----  -----------------  ----  ----  ----------  ---------
t0    Initiate A         A     t0    -           RESTART
t1    Restart Point-C    C     t1    -           RESTART
t2    A: STORE r1        A     t2    r1          STORE
t3    C: GET r2          C     t3    r2          GET
t4    Initiate B         B     t4    -           RESTART
t5    A: MODIFY r2       A     t5    r2          MODIFY
t6    A: MODIFY r1       A     t6    r1          MODIFY
t7    Initiate D         D     t7    -           RESTART
t8    D: GET r1          D     t8    r1          GET
t9    B: GET r2          B     t9    r2          GET
t10   B: GET r2          B     t10   r2          GET

Figure 4

Sample Log File

Each back-end processor that receives the rollback message must refuse to
accept any DML operations accessing the areas updated by the terminating task.
The following rollback procedure is then carried out.

1. Read the log file backwards to locate the initiation point of the task.

2. Read the log file forward from that point and perform the following
operations, depending upon the entry on the log file:

a. If the entry is an update entry for the task that is being rolled
back, make an entry in the update list which indicates the record or set
occurrence whose contents have been modified. An entry in the update list
consists of an ordered pair of record or set type and occurrence identifiers.
Copy this log file entry onto the rollback file.

b. If a MODIFY entry for the task being rolled back alters the contents
of a record occurrence that was previously modified (this is indicated by the
presence of the record occurrence on the update list), no action is taken with
respect to either the update list or rollback file. This will ensure that an
updated record is restored only once to its earliest value in the rollback
procedure.


c. If an entry for an application task other than the primary rollback
task references a record or set occurrence for which an update list entry
exists, and an entry for that task and record or set type exists in the potential
shared data list, that task must also be rolled back. Rollback of this task
must occur since the task may be operating with incorrect data. The task name
and time of this entry are saved in a secondary rollback list.

d. All entries on the log file for tasks in the secondary rollback
list are ignored.

e. All entries on the rollback file for tasks that do not have entries
in the potential shared data list of the primary rollback task are ignored.

f. All entries for non-update DML commands that operate on record or
set occurrences not in the update list are ignored.

g. Any primary rollback task retrieval entries on the log file are
ignored.

3. When the log file has been processed up to the time of termination,
the restoration of the portions of the data base affected by the primary
rollback task is performed by processing the rollback file in reverse and
executing the rollback actions indicated in Table 1.

4. The log file is read backwards in order to locate, for each task in
the secondary rollback list, the restart point immediately preceding the time
at which the incorrect operation was detected.

5. Messages are transmitted to the host processors for the tasks in the
secondary rollback list, indicating that the tasks should be rolled back to
the specified restart point. The host processors will then suspend those
application tasks and send the appropriate rollback information to the back-end
processors for those tasks.
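
The forward scan of step 2 and the reverse pass of step 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation: log entries are reduced to (time, task, verb, occurrence) tuples taken from the sample log of Figure 4, record types and occurrences are not distinguished, task A is assumed to terminate after time 10, and the contents of `SHARED_WITH_A` stand in for the potential shared data list of Figure 2, which is assumed here to pair tasks B and C with r2.

```python
UPDATE_VERBS = {"CONNECT", "DISCONNECT", "ERASE", "MODIFY", "ORDER", "STORE"}
INVERSE = {"STORE": "ERASE", "MODIFY": "MODIFY",
           "CONNECT": "DISCONNECT", "DISCONNECT": "CONNECT"}


def scan_log(log, primary, start, end, shared):
    """Forward scan (step 2): build rollback file, update list, secondary list."""
    update_list, rollback_file, secondary = [], [], {}
    for time, task, verb, occ in log:
        if not (start <= time <= end) or verb == "RESTART":
            continue
        if task == primary:
            if verb not in UPDATE_VERBS:
                continue                          # 2g: primary retrievals ignored
            if verb == "MODIFY" and occ in update_list:
                continue                          # 2b: already on the update list
            if occ not in update_list:
                update_list.append(occ)           # 2a: note the changed occurrence
            rollback_file.append((time, verb, occ))
        elif task in secondary:
            continue                              # 2d: task already marked
        elif occ in update_list and (task, occ) in shared:
            secondary[task] = time                # 2c: task used tainted data
    return rollback_file, update_list, secondary


LOG = [
    (0, "A", "RESTART", None), (1, "C", "RESTART", None),
    (2, "A", "STORE", "r1"), (3, "C", "GET", "r2"),
    (4, "B", "RESTART", None), (5, "A", "MODIFY", "r2"),
    (6, "A", "MODIFY", "r1"), (7, "D", "RESTART", None),
    (8, "D", "GET", "r1"), (9, "B", "GET", "r2"), (10, "B", "GET", "r2"),
]
SHARED_WITH_A = {("B", "r2"), ("C", "r2")}   # assumed contents of Figure 2

rollback_file, update_list, secondary = scan_log(LOG, "A", 0, 10, SHARED_WITH_A)
# Step 3: process the rollback file in reverse, applying the inverse verbs.
undo_plan = [(INVERSE[verb], occ) for _, verb, occ in reversed(rollback_file)]
print(rollback_file, update_list, secondary, undo_plan)
```

With these inputs the rollback file holds STORE r1 and MODIFY r2, the update list holds r1 and r2, and only task B is placed on the secondary rollback list, at time 9. The undo plan restores r2 first and then erases r1.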

Figure 5 illustrates the resulting rollback file, update list, and secondary
rollback list if the sample application task A of Figure 4 were to be rolled
back. The potential shared data list of Figure 2 is assumed in this example.

Rollback File (Operations)    Update List    Secondary Rollback List
--------------------------    -----------    -----------------------
STORE r1                      r1             Task    Time
MODIFY r2                     r2             B       t9

Actions

1. Task A will be rolled back to time t0.
2. r1 will be removed from the data base.
3. r2 will be restored to its status before time t5.

Figure 5

Results of Rolling Back Task A

Careful observation of Figures 2, 4, and 5 will lead to the following
conclusions:

1. C is not rolled back since it accesses r2 before A updates r2.

2. D is not rolled back since r1 is not in the potential shared data
list (i.e., it is not a critical record for D).

3. The MODIFY r1 command is not rolled back since it is preceded by a
STORE r1 command.

4. Although it is not explicitly shown in the figures, the GET r2
command at time t10 will be ignored during rollback processing, since B will
have been added to the secondary rollback list after processing the log file
entry for time t9.

3.4  Responsibilities  of  Data  Base  Administrator 

The distribution of a DBMS over a computer network enormously complicates
the data base administration function. If a recovery scheme similar to that
proposed in this paper is implemented, the DBA must make decisions that will
have substantial impact on the time required for recovery. Perhaps the most
important factors are the distribution of the data across machines and the
amount of data integration. As distribution and integration of data increase,
the data base becomes more accessible to a larger number of users. At the
same time, recovery operations become increasingly complex. This quandary
reduces to the classical data processing tradeoff between flexibility and
efficiency. It is the role of the data base administrator to balance these
two seemingly conflicting factors.

4.  Conclusion 

The rollback of an application task by a back-end processor is the central
element in any recovery scheme for a distributed data base management system.
In a highly integrated data base, the procedure may have to be repeated several
times until no new secondary rollback lists are generated. Rollback can be
executed concurrently on distinct back-end processors. However, a back-end
processor can roll back only one application task at a time. It is important
to note that the recovery procedure does not allow for redundant copies of data
at different back-end processors. Although redundancy is unnecessary in a
truly distributed DBMS, there are certain circumstances in which practical
considerations may justify redundancy [1,3]. As in all other situations of
this nature, the DBA must decide on redundancy.
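
The repetition "until no new secondary rollback lists are generated" can be pictured as a worklist loop. The Python sketch below is an illustration only: `secondary_of` is a hypothetical callback standing in for one run of the rollback procedure, returning the tasks that run places on its secondary rollback list, and the task names are invented.

```python
from collections import deque


def cascade(first_task, secondary_of):
    """Roll back tasks one at a time until no new secondary tasks appear."""
    rolled_back = []
    seen = {first_task}
    queue = deque([first_task])
    while queue:
        task = queue.popleft()          # one task at a time on this processor
        rolled_back.append(task)
        for other in secondary_of(task):
            if other not in seen:       # each task is rolled back only once
                seen.add(other)
                queue.append(other)
    return rolled_back


# Hypothetical contamination: A taints B, and B taints E and (again) A.
taints = {"A": ["B"], "B": ["E", "A"], "E": []}
order = cascade("A", lambda task: taints.get(task, []))
print(order)
```

The `seen` set is what guarantees termination when contamination is mutual, as between A and B in this example.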

The rollback procedure presented in this paper provides a mechanism
for efficient and complete recovery in a distributed data base management
system. The development of an efficient recovery mechanism can have a significant
impact on the design and usage of distributed data base systems. One
important design aspect that recovery can impact is deadlock handling. Due
to potential inefficiencies in recovery of distributed data bases, it has
been argued that deadlock prevention is more efficient than deadlock detection
for a distributed DBMS [11]. However, an efficient recovery mechanism can
make deadlock detection more attractive.

An efficient and complete recovery technique could provide data base users
with sufficient confidence in a DBMS to allow the formulation of integrated,
distributed data bases. Without that confidence, the progress in distributed
data base systems will be severely impeded by lack of acceptance in the data
processing environment. The recovery procedure presented in this report is a
step toward developing that user confidence in data base systems.